Features of Nearest Neighbors Distances in High-Dimensional Space

Author

  • Marcel Ji
Abstract

Nearest-neighbor methods are essential in a wide range of applications where probability density must be estimated (e.g. the Bayes classifier, or searching in large databases). This paper examines the distribution of nearest-neighbor distances in high-dimensional spaces. It shows that, for a uniform distribution of points in n-dimensional Euclidean space, the n-th power of the distance to the i-th nearest neighbor has an Erlang distribution. A power approximation of the newly introduced probability distribution mapping function of nearest-neighbor distances, in the form of a suitable power of the distance, is presented. The influence of the boundary effect is also discussed, along with a way to determine the distribution mapping exponent q for probability density estimation, including the boundary effect, in high-dimensional spaces.
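
The Erlang claim can be checked numerically. The sketch below is an illustration only, not code from the paper. It assumes points drawn uniformly in the unit cube, a query point at the cube's center (so the boundary effect is negligible), and a hypothetical helper `simulate` with illustrative parameter values. If the n-th power of the i-th nearest-neighbor distance is Erlang distributed, then the ball volume scaled by the point density should have mean and variance both close to i.

```python
import heapq
import math
import random


def unit_ball_volume(n):
    """Volume of the unit ball in n-dimensional Euclidean space."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def scaled_ith_nn_volume(n, i, num_points, rng):
    """Draw num_points uniform points in the unit cube and return
    density * c_n * r**n, where r is the distance from the cube's
    center to its i-th nearest neighbor and c_n is the unit-ball volume.
    The density equals num_points because the cube has volume 1."""
    sq_dists = [
        sum((rng.random() - 0.5) ** 2 for _ in range(n))
        for _ in range(num_points)
    ]
    r = math.sqrt(heapq.nsmallest(i, sq_dists)[-1])
    return num_points * unit_ball_volume(n) * r ** n


def simulate(n=3, i=2, num_points=500, trials=2000, seed=0):
    """Return the sample mean and variance of the scaled volume.
    All default values are illustrative assumptions."""
    rng = random.Random(seed)
    samples = [scaled_ith_nn_volume(n, i, num_points, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = simulate()
    # An Erlang distribution with shape i and unit rate has mean i and variance i.
    print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

With a finite number of points the scaled volume follows a Beta distribution rather than an exact Erlang distribution, so the agreement is approximate and improves as the number of points grows.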

Similar Resources

Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...

RNN (Reverse Nearest Neighbour) in Unproven Reserve Based Outlier Discovery

Outlier detection refers to the task of identifying patterns that do not conform to established regular behavior. Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality”. The current view is that distance concentration, that is, the tendency of distances in high-dimensional data to become indiscernible, making distance-based methods label all points a...

Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations

The ability to record high-resolution spectral signatures of the Earth's surface is arguably the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in terms of information content, there...

Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empiri...

Using Triangle Inequality to Efficiently Process Continuous Queries on High-Dimensional Streaming Time Series

In many applications, it is important to quickly find, from a database of patterns, the nearest neighbors of high-dimensional query points that arrive in the system as a stream. Treating each query point as a separate one is inefficient. Consecutive query points are often neighbors in the high-dimensional space, and intermediate results in the processing of one query should help the proc...

Publication date: 2004